Exploring Large Rule Spaces by Sampling

Author

  • Sergey Brin
Abstract

A great challenge for data mining techniques is the huge space of potential rules that can be generated. If there are tens of thousands of items, then potential rules involving three items number in the trillions. Traditional data mining techniques rely on downward-closed measures such as support to prune the space of rules. However, in many applications, such pruning techniques either do not sufficiently reduce the space of rules, or they are overly restrictive. We propose a new solution to this problem, called Dynamic Data Mining (DDM). DDM forgoes the completeness offered by traditional techniques based on downward-closed measures in favor of the ability to drill deep into the space of rules and provide the user with a better view of the structure present in a data set. Instead of a single deterministic run, DDM runs continuously, exploring more and more of the rule space. Instead of using a downward-closed measure such as support to guide its exploration, DDM uses a user-defined measure called weight, which is not restricted to be downward closed. The exploration is guided by a heuristic called the Heavy Edge Property. The system incorporates user feedback by allowing weight to be redefined dynamically. We test the system on a particularly difficult data set: the word usage in a large subset of the World Wide Web. We find that Dynamic Data Mining is an effective tool for mining such difficult data sets.
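As a rough illustration of two ideas in the abstract, the sketch below counts the three-item candidates for a hypothetical vocabulary of 20,000 items and checks on toy data that support is downward closed, meaning no subset of an itemset has lower support than the itemset itself. The vocabulary size, the transactions, and the function names are assumptions made for this example; none of it is taken from the paper, and it does not implement DDM, weight, or the Heavy Edge Property.

```python
from itertools import combinations
from math import comb

# Hypothetical vocabulary size ("tens of thousands of items").
n_items = 20_000
# Number of unordered 3-item combinations: about 1.3 trillion.
three_item_sets = comb(n_items, 3)

# Toy transactions, invented for illustration only.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)

def subsets_have_at_least_support(itemset, transactions):
    """Check downward closure: every proper non-empty subset of
    `itemset` has support >= the support of `itemset`."""
    s = support(itemset, transactions)
    return all(
        support(sub, transactions) >= s
        for k in range(1, len(itemset))
        for sub in combinations(itemset, k)
    )
```

Because support can only shrink as items are added, a support threshold that rejects an itemset also rejects all of its supersets; this is the pruning that the abstract describes as either too weak or too restrictive in some applications.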


Similar resources

Exploring the Relationships between Spatial and Demographic Parameters and Urban Water Consumption in Esfahan Using Association Rule Mining

In recent years, Iran has faced serious water scarcity and excessive use of water resources. Therefore, exploring the pattern of urban water consumption and the relationships between geographic and demographic parameters and water usage is an important requirement for effective management of water resources. In this study, association rule mining has been used to analyze the data of municipal w...


Dynamic Data Mining: Exploring Large Rule Spaces

A great challenge for data mining techniques is the huge space of potential rules which can be generated. If there are tens of thousands of items, then potential rules involving three items number in the trillions. Traditional data mining techniques rely on downward-closed measures such as support to prune the space of rules. However, in many applications, such pruning techniques either do not ...


Exploring the Patterns of In-Between Spaces in Guilan Historical Houses

The aim of this paper is to explain the spatial patterns of in-between spaces in Guilan historical houses in order to show their potential capacity in having various functions, and thus different forms, in the course of history. In-between spaces are mediators between two other spaces making them accessible or visible for each other. An explanation of their spatial patterns can both reveal the ...


Hit-and-Run for Sampling and Planning in Non-Convex Spaces

We propose the Hit-and-Run algorithm for planning and sampling problems in non-convex spaces. For sampling, we show the first analysis of the Hit-and-Run algorithm in non-convex spaces and show that it mixes fast as long as certain smoothness conditions are satisfied. In particular, our analysis reveals an intriguing connection between fast mixing and the existence of smooth measure-preserving ma...


Constraint-Based Preferences via Utility Hyper-Graphs

Real-world decisions involve preferences that are nonlinear and often defined over multiple and interdependent issues. Such scenarios are known to be challenging, especially in strategic encounters between agents having distinct constraints and preferences. In this case, reaching an agreement becomes more difficult as the search space and the complexity of the problem grow. In this paper, we pr...




Publication date: 1999